Memoization in Top-Down Parsing

Author

  • Mark Johnson

Abstract

In a paper published in this journal, Norvig (1991) pointed out that memoization of a top-down recognizer program produces a program that behaves similarly to a chart parser. This is not surprising to anyone familiar with logic-programming approaches to natural language processing (NLP). For example, the Earley deduction proof procedure is essentially a memoizing version of the top-down selected literal deletion (SLD) proof procedure employed by Prolog. Pereira and Warren (1983) showed that the steps of the Earley deduction proof procedure proving the well-formedness of a string S from the standard 'top-down' definite clause grammar (DCG) axiomatization of a context-free grammar (CFG) G correspond directly to those of Earley's algorithm recognizing S using G. Yet, as Norvig notes in passing, the parsers produced by his approach in general fail to terminate on left-recursive grammars, even with memoization. The goal of this paper is to discover why this is the case and to present a functional formalization of memoized top-down parsing for which this is not so. Specifically, I show how to formulate top-down parsers in a 'continuation-passing style,' which incrementally enumerates the right string positions of a category, rather than returning a set of such positions as a single value. This permits a type of memoization that, to my knowledge, has not previously been described in the context of functional programming. This kind of memoization is akin to that used in logic programming, and yields terminating parsers even in the face of left recursion. In this paper, algorithms are expressed in the Scheme programming language (Rees and Clinger 1991). Scheme was chosen because it is a popular, widely known language that many readers find easy to understand. Scheme's 'first-class' treatment of functions simplifies the functional abstraction used in this paper, but the basic approach can be implemented in more conventional languages as well.
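
The continuation-passing idea described above can be illustrated with a minimal sketch. It is written in Python rather than the paper's Scheme, it is not the author's code, and every name in it (`terminal`, `seq`, `alt`, `memo`, `recognize`) is a hypothetical choice made for this illustration. It assumes string positions are integers and uses the left-recursive toy grammar S → S 'a' | 'a'. Each recognizer takes a left position and a continuation, and calls the continuation once for every right position it finds. The memo table stores both the results found so far and the continuations waiting on them, which is what lets the left-recursive call return instead of looping.

```python
from typing import Callable, Dict, List, Tuple

Continuation = Callable[[int], None]
Recognizer = Callable[[int, Continuation], None]


def terminal(word: str, tokens: List[str]) -> Recognizer:
    """Recognize a single token; pass the next position to the continuation."""
    def rec(pos: int, k: Continuation) -> None:
        if pos < len(tokens) and tokens[pos] == word:
            k(pos + 1)
    return rec


def seq(a: Recognizer, b: Recognizer) -> Recognizer:
    """Recognize a followed by b: every right position of a feeds b."""
    def rec(pos: int, k: Continuation) -> None:
        a(pos, lambda mid: b(mid, k))
    return rec


def alt(a: Recognizer, b: Recognizer) -> Recognizer:
    """Recognize a or b: both alternatives report to the same continuation."""
    def rec(pos: int, k: Continuation) -> None:
        a(pos, k)
        b(pos, k)
    return rec


def memo(fn: Recognizer) -> Recognizer:
    """Memoize a CPS recognizer, keyed on the left string position."""
    # pos -> (continuations waiting on pos, right positions found so far)
    table: Dict[int, Tuple[List[Continuation], List[int]]] = {}

    def rec(pos: int, k: Continuation) -> None:
        if pos not in table:
            conts: List[Continuation] = [k]
            results: List[int] = []
            table[pos] = (conts, results)

            def on_result(end: int) -> None:
                # Record each new right position once and broadcast it
                # to every continuation registered so far.
                if end not in results:
                    results.append(end)
                    for c in list(conts):
                        c(end)

            fn(pos, on_result)
        else:
            # Repeated call (including a left-recursive one): do not run
            # fn again; register the continuation and replay known results.
            conts, results = table[pos]
            conts.append(k)
            for end in list(results):
                k(end)
    return rec


def recognize(tokens: List[str]) -> bool:
    """Recognize tokens with the toy grammar S -> S 'a' | 'a'."""
    def s(pos: int, k: Continuation) -> None:
        s_memo(pos, k)  # forward reference so the grammar can be recursive

    a = terminal("a", tokens)
    s_memo = memo(alt(seq(s, a), a))  # fresh memo table for each input

    ends: List[int] = []
    s_memo(0, ends.append)
    return len(tokens) in ends
```

In this sketch, `recognize(["a", "a", "a"])` succeeds even though the first alternative of S is left-recursive. Note that the recursion depth of this naive version grows with the input length, so it is meant only to show the mechanism, not to serve as an efficient parser.
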
Admittedly, elegance is a matter of taste, but personally I find the functional specification of CFGs described here as simple and elegant as the more widely known logical (DCG) formalization, and I hope that the presentation of working code will encourage readers to experiment with the ideas described here and in more substantial works such as Leermakers (1993). In fact, my own observations suggest that with minor modifications (such as the use of integers rather than lists to indicate string positions, and vectors indexed by string positions rather than lists in the memoization routines) an extremely efficient chart parser can be obtained from the code presented here. Ideas related to the ones discussed here have been presented on numerous occasions. Almost 20 years ago Shiel (1976) noticed the relationship between chart parsing and top-down parsing. Leermakers (1993) presents a more abstract discussion of the functional treatment of parsing, and avoids the left-recursion problem for memoized


Similar resources

Technical Correspondence: Techniques For Automatic Memoization With Applications To Context-Free Parsing

It is shown that a process similar to Earley's algorithm can be generated by a simple top-down backtracking parser, when augmented by automatic memoization. The memoized parser has the same complexity as Earley's algorithm, but parses constituents in a different order. Techniques for deriving memo functions are described, with a complete implementation in Common Lisp, and an outline of a macro-...


Memoization of Top Down Parsing

In a paper published in this journal, Norvig (1991) pointed out that memoization of a top-down recognizer program produces a program that behaves similarly to a chart parser. This is not surprising to anyone familiar with logic-programming approaches to NLP. For example, the Earley deduction proof procedure is essentially a memoizing version of the top-down SLD proof procedure employed by Prolo...


From EBNF to PEG

Parsing Expression Grammar (PEG) encodes a recursive-descent parser with limited backtracking. The parser has many useful properties, and with the use of memoization, it works in linear time. In its appearance, PEG is almost identical to a grammar in Extended Backus-Naur Form (EBNF), but usually defines a different language. However, in some cases only minor typographical changes are sufficie...


Modular and Efficient Top-Down Parsing for Ambiguous Left-Recursive Grammars

In functional and logic programming, parsers can be built as modular executable specifications of grammars, using parser combinators and definite clause grammars respectively. These techniques are based on top-down backtracking search. Commonly used implementations are inefficient for ambiguous languages, cannot accommodate left-recursive grammars, and require exponential space to represent par...


A New Perspective of Statistical Modeling by PRISM

PRISM was born in 1997 as a symbolic statistical modeling language to facilitate modeling complex systems governed by rules and probabilities [Sato and Kameya, 1997]. It was the first programming language with EM learning ability and has been shown to be able to cover popular symbolic statistical models such as Bayesian networks, HMMs (hidden Markov models) and PCFGs (probabilistic context free...



Journal:
  • Computational Linguistics

Volume 21  Issue 

Pages  -

Publication date: 1995